Abstract


Cross-matching is a key technique for fusing multi-band astronomical catalogs. Because of differences between
instruments such as astronomical telescopes, measurement errors, and the proper motions of celestial bodies, the
same celestial object has different positions in different catalogs, which makes it difficult to
integrate multi-band or full-band astronomical data. In this study, we propose an online cross-matching method
based on pseudo-spherical indexing techniques and develop a service, combined with a high performance computing
system (Taurus), to improve cross-matching efficiency; the service is designed for the Data Center of Xinjiang
Astronomical Observatory. Specifically, we use Quad Tree Cube to divide the celestial sphere into blocks,
map the 2D space composed of R.A. and decl. to a 1D space, and thereby establish a correspondence between real
celestial objects and spherical patches. Finally, we verify the performance of the service using the Gaia DR3 and PPMXL
catalogs, and we send the matching results to the VO tools TOPCAT and Aladin to obtain visual
results. The experimental results show that the service effectively solves the speed bottleneck of
cross-matching caused by frequent I/O and significantly improves the retrieval and matching speed of massive
astronomical data.
1. Introduction


The cross-matching calculation is the basis for the fusion of 
multi-band astronomical observations and a key technique for 
multi-band astronomy research. It can realize the fusion of 
astronomical data from different bands to obtain multi-band or 
all-band data, which is beneficial for astronomers to reveal 
celestial information and better use the various data in the 
catalog for scientific research (Yu et al. 2019). With the rapid 
development of astronomical technology, many countries have 
built or plan to build telescopes covering multiple bands. For 
instance, (i) in the radio band, Square Kilometre Array 
(Dewdney 2008), Five-hundred-meter Aperture Spherical radio 
Telescope (Nan et al. 2011), Robert C. Byrd Green Bank 
Telescope (Prestage et al. 2009), and the upcoming QTT (QiTai 
radio Telescope) under construction (Wang et al. 2023; Zhang 
et al. 2023a), etc. (ii) in the optical band, European Extremely
Large Telescope (Gilmozzi & Spyromilio 2007), Large
Synoptic Survey Telescope (Zhan & Tyson 2018), Large
sky Area Multi-Object fiber Spectroscopic Telescope (Cui
et al. 2012), etc. (iii) in other bands, Lunar-based Ultraviolet
Telescope (Wang et al. 2015), Cherenkov Telescope Array
(Acharya et al. 2017), extended ROentgen Survey with an
Imaging Telescope Array (Predehl et al. 2021), etc. It can be seen
that astronomy has entered the big data and full-band era (Cui 
et al. 2020), and the measurement errors of various astronomical 
telescopes have led to different data obtained from observing the 
same celestial object, causing some difficulties in integrating 
multi-band or full-band astronomical data. 

The Data Center of Xinjiang Astronomical Observatory
(XAO-DC) was built in 2015 (Zhang et al. 2022). Its main
data sources include the Nanshan 26 m Radio Telescope (NSRT; Xu
et al. 2018) and the Nanshan One meter Wide-field Telescope
(NOWT; Bai et al. 2020). It provides online retrieval services for
pulsar, molecular spectrum, active galactic nuclei, and NOWT
data sets (Zhang et al. 2019). To help astronomers make
better use of the data in the astronomical catalogs for scientific
research, we develop a cross-matching service for XAO-DC.

The service features that we developed can be summarized 
as follows: 


(i) We implement pseudo-spherical sky partition, which
divides the whole sky sphere into ~6 × 4^30 approximately
equal blocks to accurately locate the required data and
reduce unnecessary data reading, thereby reducing disk
I/O.
(ii) The service improves the speed of cross-matching by
using pseudo-sphere index technology and parallel
computing technology, which reduces the time needed to
cross-match tera-scale astronomical catalogs.

(iii) Experimental results show that our online cross-matching 
service achieves 4 trillion cross-matching computation 
results in less than one second. 


The rest of this paper is organized as follows. In Section 2, 
some related works about cross-matching calculation are 
introduced. The developed cross-matching service, which is 
the core of this paper, is presented in detail in Section 3. In 
Section 4, real astronomical catalogs are tested and the 
experimental results are verified. Finally, Section 5 concludes 
the paper. 


2. Related Work and Background 
2.1. Related Work 


The astronomical catalogs contain a variety of celestial
parameters, collecting data obtained by the telescope during a
specific period of astrometry. Researchers in
many countries are studying methods for cross-matching
astronomical catalogs and have developed a number of tools and
algorithms. Budavari & Szalay (2007) formulated
cross-matching in a Bayesian framework to improve
speed; the framework provides a solid theoretical foundation
and improves recall and precision. Pineau et al. (2011) developed an
efficient and scalable cross-matching service for (very) large
catalogs that supports customized cross-matching operations.
VizieR (Ochsenbein et al. 2000), designed by the Centre
de Données astronomiques de Strasbourg (CDS), includes the cross-matching
of astronomical observations and large catalogs, which can be
performed by uploading directory files and astronomical
catalogs in the tool. SIMBAD (Wenger et al. 2000) provides
multi-source queries for small astronomical catalog files,
based on cross-matching of small astronomical
catalogs. Many different options can be selected during
cross-matching, such as the type of source. Xmatch (Budavari
& Lee 2013) is one of a wide range of cross-matching tools;
it integrates the data sets of many observatories, such as
2MASS, GSC, GALEX, UCAC, WISE, etc., and provides
functions such as download, query and integration of
astronomical tables. ARCHES (Motch et al. 2016) is a
cross-matching service for high-energy astrophysics research, which
provides multi-band data with complete characteristics in the
form of spectral energy distributions. Astronomers can submit
their own retrieval scripts through an HTTP API, and the system
sends them the cross-matching results after the
script has run. catsHTM (Soumagnac & Ofek 2018) uses an HTM
index to store hierarchical astronomical catalogs in HDF5 files,
integrates DECaLS/DR5, FIRST, Gaia/DR1, Gaia/DR2,
GALEX/DR6Plus7 and other data sets, and supports
cross-matching between dozens of astronomical catalogs.

To speed up cross-matching calculation, Pei et al. (2011)
greatly improved the speed of cross-matching using a Python
multi-core parallel method. Zhao et al. (2009) used HEALPix
to divide the astronomical catalogs, combined with a fast
index based on bit operations, and kept the cross-matching time of
large-scale astronomical catalogs within 32 minutes. Du et al.
(2014) combined two partition indexing methods, HTM and
HEALPix, and used thread pool technology to shorten the
cross-matching time. They reduced the cross-matching time of
large-scale astronomical catalogs to 23 minutes, and that of
medium-sized astronomical catalogs
to 7 minutes. Ma et al. (2018) proposed the E-Zone algorithm,
which uses Euclidean distance for faster calculation of adjacent
points, and implements parallel calculation based on OpenMP.
Li et al. (2019) designed a unified multi-band catalog format,
combined it with a minimum-conflict data layout strategy to
improve the parallelization of cross-matching, and achieved
30.3% and 30.7% time reductions compared with Quad Tree
Cube (Q3C) and HEALPix Tree C (H3C), respectively, for
astronomical catalogs with 200 million sources. Zhang et al. (2023b) proposed
a large-scale cross-matching framework supporting heterogeneous
computing, which reduced the cross-matching time to 5 s
for small-scale astronomical catalogs, 150 s for medium-scale
astronomical catalogs, and 260 s for large-scale astronomical
catalogs.


2.2. The Cross-matching Based on Celestial Coordinates 


The cross-matching calculation of astronomical catalogs can
combine various information, such as location, density,
luminosity, wavelength, and so on. We choose to match
on celestial coordinates because catalogs obtained by
different telescopes all contain information about the location
of celestial sources. Therefore, we can determine whether two
catalog entries are homologous (the same object) or
non-homologous by comparing their celestial coordinates.
As shown in Figure 1, the
two points A and B come from astronomical catalogs A and B,
respectively. When the spherical distance $d < 3\sqrt{r_1^2 + r_2^2}$ (in
theory), where $r_1$ and $r_2$ are the error radii of the two catalogs,
the two points are successfully matched as the same object.
In the web implementation, we provide a Search radius
option: users enter a matching radius according to their actual
needs, and a pair is output when the distance between a
point in the input catalog and a point in the matching catalog is less than
Search radius.
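As a minimal sketch of the two matching conditions described above, the following Python code computes the angular distance between two positions with the haversine formula and applies either the error-radius criterion or the user-supplied Search radius. The function names, the haversine choice, and the multiplier `k` (set to 3 on the assumption that the theoretical criterion is a 3-sigma-style test) are illustrative assumptions, not the code used by the service.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle distance in degrees between two (R.A., decl.) positions.

    Uses the haversine form, which stays accurate for the very small
    separations that are typical in cross-matching.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def is_match_by_errors(d_deg, r1_deg, r2_deg, k=3.0):
    """Theoretical criterion: d < k * sqrt(r1^2 + r2^2).

    k = 3 is an assumption made for this sketch.
    """
    return d_deg < k * math.sqrt(r1_deg ** 2 + r2_deg ** 2)


def is_match_by_radius(d_deg, search_radius_deg):
    """Practical criterion of the web form: d < Search radius."""
    return d_deg < search_radius_deg


if __name__ == "__main__":
    # Hypothetical pair of positions, about half a milli-degree apart.
    d = angular_separation_deg(10.0, 20.0, 10.0, 20.0005)
    print(d, is_match_by_radius(d, 0.001))
```

The fixed Search radius is simpler to expose in a web form than per-catalog error radii, which is consistent with the choice described in the text.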


3. A Cross-matching Service for XAO-DC 


3.1. The Overall Design of the Service 


We develop an online cross-matching service based on the
German Astrophysical Virtual Observatory DaCHS
(Demleitner et al. 2014) for the massive astronomical catalogs
in XAO-DC. The overall structure of the service is shown in
Figure 2. The service is decomposed into (from top to
bottom) a data layer, a calculation layer, and an output layer.

Figure 1. Cross-matching calculation between two astronomical catalogs.


(i) The data layer. Astronomers upload the astronomical
catalogs that need to be cross-matched in one of two ways:
via a remote URL or by local upload, in both cases as VOTable files. We
provide three ways, Web, VO tools and Python scripts,
to obtain the archived astronomical catalogs of XAO-DC.
By 2023 April, we had archived 20 astronomical data
catalogs, with catalogs of pulsars, molecular spectra, and
active galactic nuclei from NSRT and catalogs of the
One-Meter Telescope from NOWT. All astronomical catalogs
in XAO-DC are backed up at the headquarters of XAO
and Nanshan station.

(ii) The calculation layer. This layer is the core of the
whole service and uses parallel computing techniques
for the cross-matching calculation. We use the celestial
coordinates for cross-matching, that is, we
calculate the angular distance between entries of the two
astronomical catalogs. Theoretically, when the angular distance
$d < 3\sqrt{r_1^2 + r_2^2}$, where $r_1$ and $r_2$ are the error radii of
the two catalogs, the match
is successful. In practice, we test the condition
d < Search radius. To improve the speed of
cross-matching, we use a high performance computing
system, which was built in 2016 and named Taurus
(Zhang et al. 2018).


(iii) The output layer. This layer provides a variety of output
formats, such as CSV, HTML, FITS, JSON, etc.
Astronomers output and download the cross-matching
results according to their actual scientific needs. Through
the Simple Application Messaging Protocol (SAMP), the
cross-matching results are sent to
standard virtual observatory tools. This integrates data
visualization and other related tools, allows astronomers
to customize the processing of cross-matching results,
and lets them complete the whole process of scientific
research and analysis online.
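To illustrate how the calculation layer can distribute work, the sketch below splits the uploaded catalog into chunks and matches each chunk against a reference catalog in a worker pool, using the practical condition d < Search radius. Everything here is an assumption made for illustration: the function names, the brute-force inner loop (the service relies on an index instead), and the use of a thread pool, which in CPython mainly demonstrates the chunking pattern; it does not represent the Taurus implementation.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle distance in degrees (haversine form)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def match_chunk(chunk, reference, radius_deg):
    """Brute-force match of one chunk; returns (uploaded_idx, reference_idx) pairs."""
    pairs = []
    for i, (ra1, dec1) in chunk:
        for j, (ra2, dec2) in enumerate(reference):
            if separation_deg(ra1, dec1, ra2, dec2) < radius_deg:
                pairs.append((i, j))
    return pairs


def parallel_cross_match(uploaded, reference, radius_deg, workers=4):
    """Split the uploaded catalog into chunks and match them concurrently."""
    indexed = list(enumerate(uploaded))
    size = max(1, math.ceil(len(indexed) / workers))
    chunks = [indexed[k:k + size] for k in range(0, len(indexed), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda c: match_chunk(c, reference, radius_deg), chunks))
    return sorted(pair for part in parts for pair in part)


if __name__ == "__main__":
    # Hypothetical toy catalogs of (R.A., decl.) in degrees.
    uploaded = [(10.0, 20.0), (50.0, -30.0)]
    reference = [(10.0002, 20.0001), (200.0, 0.0), (50.0, -30.0005)]
    print(parallel_cross_match(uploaded, reference, 0.001))
```

For CPU-bound matching in production, separate processes or a database-side index would be the more natural choice; the thread pool is used here only to keep the sketch self-contained.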


3.2. Indexing Strategy for Astronomical Catalogs 


Figure 2. Overview of online cross-matching service.

Table 1
Astronomical Catalogs in XAO-DC (The statistics are available through 2023 April 30)

Catalogs       Wave band   Count        URL
ppmxl.main     Optical     910468688    http://data.xao.ac.cn/ppmxl/q/cone/form
gaia.dr3lite   Optical     1811709771   http://data.xao.ac.cn/gaia/q3/cone/form

We use Q3C index technology (Koposov & Bartunov 2006),
which is designed for the PostgreSQL open source database,
to improve the retrieval efficiency. There are several reasons
for using Q3C: (i) it is optimized for cone search,
cross-matching and similar operations, because it uses central
projection to avoid many trigonometric function calculations,
thus reducing the search time; (ii) it is an open source solution
and can be downloaded from http://sourceforge.net/projects/q3c;
(iii) it guarantees the best I/O performance for retrieving
data from the database. As shown in Figure 3, we approximate the
celestial sphere by a cube, construct a quadtree on each face of
the cube, and use the quadtree structure to generate
two-dimensional coordinate codes (or positive integer codes). Since
the initial cube has six faces, the mapping to faces can be
encoded using a 3-bit binary number. This partition is easily
implemented by projecting the faces of the cube from its center onto
the sphere, so that the quadtree structure is automatically
inherited by the sphere. Ultimately, the sphere is divided into
quadrilaterals at different levels of partition.
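The cube-and-quadtree idea can be sketched in a few lines of Python. The code below is a simplified illustration under our own assumptions, not Q3C's actual code: the face numbering, the bit layout of the key, and the function names are hypothetical. It projects a position onto one of six cube faces by central projection, subdivides the face `depth` times as a quadtree, and interleaves the two grid coordinates into a single integer, which maps the 2D (R.A., decl.) space to a 1D key.

```python
import math


def cube_face_xy(ra_deg, dec_deg):
    """Central projection of a sky position onto the enclosing cube.

    Returns (face, x, y) with x and y in [0, 1]. The face numbering
    (0 = +z, 1 = +x, 2 = +y, 3 = -x, 4 = -y, 5 = -z) is arbitrary here.
    """
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    vx = math.cos(dec) * math.cos(ra)
    vy = math.cos(dec) * math.sin(ra)
    vz = math.sin(dec)
    ax, ay, az = abs(vx), abs(vy), abs(vz)
    if az >= ax and az >= ay:        # polar faces
        face, u, v, w = (0 if vz > 0 else 5), vx, vy, az
    elif ax >= ay:                   # faces crossed by the x axis
        face, u, v, w = (1 if vx > 0 else 3), vy, vz, ax
    else:                            # faces crossed by the y axis
        face, u, v, w = (2 if vy > 0 else 4), vx, vz, ay
    # u / w and v / w lie in [-1, 1]; rescale them to [0, 1].
    return face, (u / w + 1.0) / 2.0, (v / w + 1.0) / 2.0


def interleave_bits(ix, iy, depth):
    """Z-order code: bit i of ix goes to bit 2i, bit i of iy to bit 2i + 1."""
    code = 0
    for i in range(depth):
        code |= ((ix >> i) & 1) << (2 * i)
        code |= ((iy >> i) & 1) << (2 * i + 1)
    return code


def pixel_index(ra_deg, dec_deg, depth=30):
    """1D key: face number in the top bits, quadtree cell code below it."""
    face, x, y = cube_face_xy(ra_deg, dec_deg)
    n = 1 << depth                   # cells per face edge
    ix = min(int(x * n), n - 1)      # clamp the x == 1.0 edge case
    iy = min(int(y * n), n - 1)
    return (face << (2 * depth)) | interleave_bits(ix, iy, depth)


if __name__ == "__main__":
    # Nearby positions share a cell at shallow depth.
    print(pixel_index(10.0, 20.0, depth=3), pixel_index(10.001, 20.001, depth=3))
```

With this layout there are 6 × 4^depth cells in total, and positions that are close on the sky usually receive nearby keys, which is what lets a one-dimensional database index serve positional queries.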


4. Performance of the Cross-matching Service 


4.1. Archived Astronomical Catalogs for Cross-matching 
Service in XAO-DC 


We have completed the archiving of observation data of the
NSRT and the NOWT, including four data sets, namely the pulsar
data set, molecular spectral line data set, active galactic nuclei
data set and NOWT data set (see Table 1 for the details of each
data set). Larger catalogs that can be matched against include
Gaia, 2MASS, USNO-B, PPMXL, and more. We use a server
with Intel(R) Xeon(R) Silver 4210R CPU @ 2.40 GHz × 72,
256 GB memory, 4 TB × 2 SSD and 16 TB × 60 SATA for online
cross-matching experiments.

Figure 3. Indexing strategy using Q3C.

Table 2
Input Fields

Name        Table Head      Description
fileSrc     Local file      A local file to upload (overrides remote table if given).
SR          Search radius   Search radius in cross-matching.
tableName   Target Table    Name of the table to match against.
urlSrc      Remote URL      A URL of a table to cross-match.


4.2. Use Case for Cross-matching Service 


4.2.1. Input Fields 


As shown in Table 2, the following fields are available to provide input
to the service. The uploaded VOTables must have exactly one
pair of columns with UCDs of either pos.eq.[ra|dec];meta.main
or POS_EQ_[RA|DEC]_MAIN. The results of VO cone
searches work well. If users have tables of their own, they
will first have to convert them to the VOTable format. We
currently do not support the transformation of coordinates,
so users have to make sure that the input coordinates match the
system used in the table (for basically all of our tables, this
means ICRS or FK5 J2000 to an accuracy sufficient for
matching). We provide an experimental use case, shown in
Table 3, which can be tested through the URL
http://data.xao.ac.cn/cross/q/match/form.

Table 3
Experimental Use Case

Parameter       Value        Parameter       Value
Target Table    ppmxl.main   Limit to        10,000
Search radius   0.001°       Output format   HTML
Remote URL      http://210.73.36.111/static/cross match 4000
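As an illustration of how the input fields of Table 2 fit together, the sketch below assembles them into a request URL for the form service. It is a hedged example: whether the service accepts these fields as plain query parameters is an assumption, the helper name `build_match_url` is ours, and the remote catalog URL is a hypothetical placeholder. The code only builds the URL and does not contact the service.

```python
from urllib.parse import urlencode

# Form URL quoted in the text; the query-string usage below is an assumption.
SERVICE_URL = "http://data.xao.ac.cn/cross/q/match/form"


def build_match_url(table_name, search_radius_deg, url_src):
    """Assemble a request from the Table 2 fields tableName, SR and urlSrc."""
    if search_radius_deg <= 0:
        raise ValueError("search radius must be positive")
    params = {
        "tableName": table_name,      # catalog to match against
        "SR": search_radius_deg,      # search radius, in degrees as in Table 3
        "urlSrc": url_src,            # remote VOTable to be matched
    }
    return SERVICE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical remote VOTable location.
    print(build_match_url("ppmxl.main", 0.001, "http://example.org/my_catalog.vot"))
```

Validating the radius before sending the request avoids a round trip for an input the service could not use.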
Figure 4. The influence of search radius on experimental results. (Horizontal axis: Search radius; plotted series: Number of matched, Browser response time (ms), Browser response size (KB).)


Table 4
Cross-matching Results

ID                    R.A. (deg)   Decl. (deg)   E_raepra (deg)   E_deepde (deg)   Others: Decl.
1270486784963202545   335.159448   -28.396266    3.69000009e-05   3.69000009e-05   -28.3958107
1271723434119163981   336.874314   -28.562427    4.52999993e-05   4.52999993e-05   -28.563134
1272238947138844943   338.898195   -27.889961    4.52999993e-05   4.52999993e-05   -27.8893547
1272238947067022759   338.898547   -27.889473    3.69000009e-05   3.69000009e-05   -27.8893547
1273186315633093107   335.85566    -25.841385    4.52999993e-05   4.52999993e-05   -25.8415443
1273186498931695147   335.856555   -25.841449    3.69000009e-05   3.69000009e-05   -25.8415443
1276884716561952497   340.584908   -21.463612    4.52999993e-05   4.52999993e-05   -27.4637784
1279886786271825914   348.146      -35.049952    4.52999993e-05   4.52999993e-05   -35.0499171
1289552211362865684   352.075615   -31.208315    4.52999993e-05   4.52999993e-05   -31.2081914
1289552211671667337   352.075928   -31.207844    4.52999993e-05   4.52999993e-05   -31.2081914
1289979501643796662   350.594732   -29.149134    2.42000006e-05   2.42000006e-05   -29.14918
1293170549451374502   353.934296   -30.428043    2.47000007e-05   2.47000007e-05   -30.4280689
1293170549479084174   353.933957   -30.427821    4.52999993e-05   4.52999993e-05   -30.4280689
1295566798075471614   354.101817   -27.034899    4.52999993e-05   4.52999993e-05   -27.0341224
1295566799485120388   354.102276   -21.034219    4.52999993e-05   4.52999993e-05   -27.0341224
1296263507508196424   359.165515   -29.293381    3.69000009e-05   3.69000009e-05   -29.2931233
1296263507525977872   359.165854   -29.293347    3.69000009e-05   3.69000009e-05   -29.2931233
1299058494154989044   317.99219    -18.645028    3.69000009e-05   3.69000009e-05   -18.6451978
1299163679116059348   317.990165   -18.347378    4.52999993e-05   4.52999993e-05   -18.3476311
1299163679032991356   317.990235   -18.347413    3.18999992e-05   3.18999992e-05   -18.3476311
1300053359303298668   315.403934   -15.115888    2.08000001e-05   2.08000001e-05   -15.1159916
1300236670317797321   316.783602   -16.254174    2.75000002e-05   2.75000002e-05   -16.2542654
2130664077477512878   33.374175    13.021185     3.69000009e-05   3.69000009e-05   13.0209357
2132243439030599117   36.323942    12.778484     4.52999993e-05   4.52999993e-05   12.7788668

4.2.2. Output Result 


We obtain the matched data and the corresponding
parameter information, including R.A. [deg], decl. [deg],
E_raepra [deg], etc.; see Table 4 for details of the
cross-matching results. The cross-matching between
ppmxl.main (~1 billion targets) and the test
catalog (4000 targets) takes less than one second. That
means our online cross-matching service achieves 4 trillion
(4000 × ~10^9) cross-matching computation results in less than one
second. As far as we know, Gao et al. (2008) took
407 minutes (811117 × 470992970); Zhao et al. (2009)
took 32 minutes (470992970 × 100106811); Pei et al. (2011)
took 10 minutes (470992970 × 100106811); Du et al. (2014)
took 7 minutes (946464 × 470992970). Because there is no
online platform for testing the methods implemented in the
above works, it is not possible to repeat them at the same scale of
cross-matching as in this paper.

As shown in Figure 4, we exhibit the corresponding
relationship among Number of matched, Browser response
time and Browser response size. As the search radius
diminishes (from 0.010° to 0.001°), the Number of matched
(from 12,997 to 757), Browser response time (from 13,640 to
503 ms), and Browser response size (from 1228.8 to 89.5 KB)
decrease accordingly. This shows that the smaller the search
radius, the more accurate the cross-matching results.


Figure 5. Cross-matching results obtained with TOPCAT, where (a) search radius = 0.001°; (b) search radius = 0.004°; (c) search radius = 0.007°; (d)
search radius = 0.010°. The x-coordinate is R.A. [deg] and the y-coordinate is decl. [deg]. The rest of the visual results can be accessed from our cross-matching service
(URL: http://data.xao.ac.cn/cross/q/match/form); please click “Send via SAMP” after obtaining the matched result.


Figure 6. Cross-matching results obtained with Aladin, where (a) search radius = 0.001°; (b) search radius = 0.010°. The x-coordinate is R.A. [deg]
and the y-coordinate is decl. [deg]. The rest of the visual results can be accessed from our cross-matching service (URL: http://data.xao.ac.cn/cross/q/match/form);
please click “Send via SAMP” after obtaining the matched result. Because of the limited space, we present only two results; the remaining results can be verified in our
cross-matching service.


4.3. Use Case for Cross-matching with Virtual
Observatory Tools


In practice, do not use a web browser for cross matching.
Instead, obtain a TAP client (e.g., TOPCAT or pyVO), load the
table to be matched into the client and then run a cross-match
query against the uploaded table
(Dowler & Demleitner 2019). Since the backend is the same,
the performance characteristics are as for the browser service
discussed above.
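A query of the kind a TAP client would run can be sketched as follows. This is an illustrative ADQL cross-match built as a Python string, not necessarily the query used by the authors: the column names `raj2000`/`dej2000` for ppmxl.main, the upload name `t1`, the uploaded column names `ra`/`dec`, and the helper name are all assumptions to be checked against the actual table metadata. `TAP_UPLOAD` is the schema under which TAP exposes uploaded tables.

```python
def build_upload_match_query(target_table, ra_col, dec_col,
                             upload_name, up_ra, up_dec,
                             radius_deg, top=10000):
    """Build an ADQL positional join between a server table and an upload.

    All table and column names are supplied by the caller; the values used
    in the example below are assumptions, not verified service metadata.
    """
    return (
        f"SELECT TOP {top} * "
        f"FROM {target_table} AS db "
        f"JOIN TAP_UPLOAD.{upload_name} AS mine "
        f"ON 1 = CONTAINS("
        f"POINT('ICRS', db.{ra_col}, db.{dec_col}), "
        f"CIRCLE('ICRS', mine.{up_ra}, mine.{up_dec}, {radius_deg}))"
    )


if __name__ == "__main__":
    # The resulting string would be pasted into, or submitted by, a TAP client
    # together with the uploaded table.
    print(build_upload_match_query("ppmxl.main", "raj2000", "dej2000",
                                   "t1", "ra", "dec", 0.001))
```

The radius in `CIRCLE` is in degrees, so 0.001 corresponds to the search radius of the use case in Table 3, and `TOP` plays the role of the "Limit to" field.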


4.3.1. TOPCAT 


TOPCAT is a browser and editor that can interactively
graph tables of astronomical data in major formats such as
FITS and VOTable. To help astronomers
analyze data, we can send the cross-matching results to
TOPCAT through SAMP, as shown in Figure 5.


4.3.2. Aladin 


Aladin is free, interactive astronomy software that enables
astronomers to interactively retrieve digitized astronomical images
from the astronomical catalogs of all known celestial objects, such
as Simbad and VizieR, and visually compare them with DSS,
PanSTARRS and other astronomical catalogs. To help
astronomers analyze data, we can send the cross-matching
results to Aladin through SAMP, as shown in Figure 6.


5. Conclusion 


In this paper, we proposed an online cross-matching method
based on pseudo-spherical indexing techniques and developed a
service, combined with Taurus, for the XAO-DC to improve
cross-matching efficiency. This service supports two input modes for the source table file:
local upload and URL. File input uses the standard
VOTable format, and the service performs the cross-matching calculation
between the uploaded astronomical catalogs and the released
astronomical catalogs in the XAO-DC. It also
supports HTML, CSV, FITS, JSON and other data output modes,
and integrates the necessary visualization tools (such as TOPCAT,
Aladin, etc.) according to the related protocols of the virtual
observatory to support the processing and customization of the data
after cross-matching. The service provides astronomers with
reliable and convenient technical support, which is intended to
help them further their astronomical research.